Iteration Mapping: Loop Software Pipelining on an XIMD
Authors
Abstract
The multiple instruction streams, low synchronization cost, and synchronous nature of the XIMD (variable instruction stream, multiple data stream) architecture create an opportunity for a new architecture-compiler interface. As an extension to the VLIW (Very Long Instruction Word) architecture, the XIMD can exploit all VLIW scheduling techniques, but these do not take full advantage of the unique features of the XIMD. A new loop scheduling method for the XIMD, Iteration Mapping, is proposed that can exceed the performance of VLIW loop scheduling techniques on an XIMD. The medium-grained parallelism between loop iterations has been selected as the target for the first XIMD compiler implementation. This paper discusses how the relationship between the characteristics of loops and the architecture affects scheduling, presents the Iteration Mapping scheduling technique, presents performance data for loops scheduled using this new technique and compares it with recent results on software pipelining for the VLIW, and demonstrates the applicability of the technique for loop-intensive code. The weighted harmonic mean of the speedup for 4 functional units on the first fourteen Livermore Loops is currently 3.5. For selected loops to which Iteration Mapping is particularly well suited, the weighted harmonic mean of the speedup on 16 functional units is 14.5, more than 90% of linear speedup. The concepts behind Iteration Mapping are clean and understandable, leading to a compiler that is straightforward, fast, and easy to implement.
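As a reading aid for the figures quoted in the abstract, the following minimal sketch shows how a weighted harmonic mean of speedups is usually computed and checks the "more than 90% of linear speedup" claim. The per-loop speedups and weights are hypothetical placeholders, not data from the paper, and the function name is an assumption made for illustration; the abstract does not say how the weights were chosen.

```python
def weighted_harmonic_mean(speedups, weights):
    """Return sum(w) / sum(w / s) for paired speedups s and weights w."""
    if not speedups or len(speedups) != len(weights):
        raise ValueError("speedups and weights must be non-empty and equal in length")
    return sum(weights) / sum(w / s for s, w in zip(speedups, weights))


# Hypothetical per-loop speedups and weights (NOT taken from the paper).
speedups = [2.0, 4.0, 4.0]
weights = [1.0, 1.0, 2.0]
whm = weighted_harmonic_mean(speedups, weights)

# Figures quoted in the abstract: speedup 14.5 on 16 functional units.
efficiency = 14.5 / 16  # fraction of linear speedup

print(f"weighted harmonic mean (hypothetical data): {whm:.2f}")
print(f"fraction of linear speedup at 16 units: {efficiency:.3f}")
```

The harmonic mean is the conventional choice for averaging rates such as speedups because it is dominated by the slowest cases, so a reported value of 14.5 out of 16 is a conservative summary.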
Similar resources
Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture
This paper presents an approach to scheduling loops that leverages the distinctive architectural features of the XIMD, particularly the variable number of instruction streams and low synchronization cost. The classical VLIW and MIMD architectures have a fixed number of instruction streams, each with a fixed width. A compiler for the XIMD architecture can exploit fine-grained parallelism within ...
Compiling Regular Computations to Fine-Grained Linear Processor Arrays
Fine-grained linear processor arrays are an important class of architectures for obtaining high performance on computationally intensive algorithms with large data sets, as found prevalently in digital signal processing and scientific computing. The vast number of processing elements on these architectures provides an immense amount of potential parallelism but at the price of limited interconnec...
Enhanced Loop Flattening for Software Pipelining of Arbitrary Loop Nests
This paper introduces and evaluates enhanced loop flattening, a compiler framework for transforming an arbitrary set of nested and sequenced loops into a single loop with additional logic to implement the control flow of the original code. Loop flattening allows conventional software pipelining algorithms to effectively pipeline nested loops. Conventional software pipelining approaches are only...
A Software Pipelining Method Based on a Hierarchical Social Algorithm
Software pipelining is a compile-time scheduling technique that overlaps successive loop iterations to achieve instruction-level parallelism. It allows us to hide memory latency by overlapping the prefetches for a future iteration with the computation of the current iteration. This paper presents an efficient algorithm for determining the iteration bound of cyclic data flow graphs and the optim...
Software pipelining of nested loops
This paper presents an approach to software pipelining of nested loops. While several papers have addressed software pipelining of inner loops, little work has been done in the area of extending it to nested loops. This paper solves the problem of finding the minimum iteration initiation interval (in the absence of resource constraints) for each level of a nested loop. The problem is formulated a...